Description Using probabilistic modeling approaches, we are modeling the farm level benefits of Integrated Soil Fertility Management (ISFM), a sustainable intensification practice. ISFM holds promise to improve soil fertility and increase productivity on the already exciting land for small-holder farmers in northern Ghana. ISFM is a set of the following components (below), starting with a few components as partial ISFM until complete ISFM (Vanlawe et al., 2010: https://doi.org/10.5367/0000000107911699)

Improved seed As an entry point to ISFM, using improved and resilient seed has its own benefits and costs incurred as they are more expensive than the farmers seed. However farmers are aware that these improved seed yield more than their own seed.
Improved seed:

Improved seed:

Inorganic fertilizer Because soils are poor, they need to be replenished with additional nutrients. Using inorganic fertilizer such as NPK and urea on improved seed is the second layer in ISFM. This second layer of ISFM involves additional costs of buying the fertilizer and additional labor for fertilizer application.

Mineral fertilizer:

Mineral fertilizer:

Organic fertilizer This includes using improved seed and the use of already available organic material such as crop residue and animal manure. However farmers are not always able to apply the amount of organic fertilizer needed for sufficient fertilization because the required amount is usually high.

Organic fertilizer:

Organic fertilizer:

Fertilizer combination Because each type of fertilizer (organic or inorganic) has its unique characteristic and benefit, combining the two will have synergistic benefits that will be more beneficial than applying one alone on improved seed. Combining them also reduces the amount applied for each compared to when they are applied alone.

Fertilizer combination:

Fertilizer combination:

Minimum tillage To avoid the compaction of soil with heavy machinery, tillage is recommended. However in the context of northern Ghana where zero tillage is almost impossible due to long drought and the nature of their soils, minimum tillage is recommended. Here farmers are also advised to avoid using tractors on their soils to tilt the land.

Minimum tillage:

Minimum tillage:

Complete ISFM When all the above components are fully adopted simultaneously.
Complete ISFM:

Complete ISFM:

The 6 different practices are the decision options in a maize and soybean rotation system against the baseline of maize monoculture: 1. Use of Improved seed (As the entry point to ISFM) 2. Improved seed and inorganic fertilizer 3. Improved seed and organic fertilizer 4. Improved seed and fertilizer combination (half organic and inorganic fertilizer) 5. Improved seed and minimum tillage 6. Improved seed, fertilizer combination and minimum tillage (Complete/full ISFM)

The model will answer the following question: Should Agricultural Research and Development (AR4D) actors promote ISFM in northern Ghana?

The impact pathways of ISFM incorporate the costs, the benefits and the risks of each ISFM component for a decade because estimations related to soil are mostly done for a decade so we used this as the basis for our time frame (Guibert, 1999). The impact pathway was translated into a mathematical model and run in Monte Carlo simulation to find the plausible impact of ISFM and the most economic and sustainable ISFM business model.

Monte Carlo simulation for ISFM benefits and trade-offs in northern Ghana

Using the pre-built functions for the 6 ISFM decision options, we run a MC simulation

# Source our model from the model script
source("functions/ISFM_components_function.R")

# Setting seed to ensure consistent results
#each time we run the entire simulation 
set.seed(243) 

ISFM_mc_simulation <- mcSimulation(as.estimate(ISFM_table), 
                              model_function = ISFM_components_function,
                              numberOfModelRuns = 10000,
                              functionSyntax = "plainNames")

Plotting the different outcomes of the 6 decision options to find the most sustainable and economically viable option of ISFM based on the sustainable intensification assessment framework indicators. The most sustainable and beneficial option will be the one that will be further assessed based on the different farmers typology later. We first arrange the Monte Carlo simulation data

#Arranging the Monte Carlo simulation data first
library(tidyverse)
library(tidyr)
library(stringr)

# Extract values from the simulation output to calculate the absolute difference (ISFM level -baseline)

ISFM_impact_data <- ISFM_mc_simulation$y[6:35]

# Pivot the data to long format
ISFM_pivoted_outcome_data <- tidyr::pivot_longer(ISFM_impact_data,
                                                 cols = everything(), 
                                                 names_to = "Outcome with ISFM", 
                                                 values_to = "Absolute_difference")

# Extract outcome type
ISFM_pivoted_outcome_data["Outcome"] <- 
  str_extract(ISFM_pivoted_outcome_data$`Outcome with ISFM`, 
  "productivity|income|environmental|social|human")

# Extract ISFM practice type by removing the identified outcome from the original column name
ISFM_pivoted_outcome_data["Practices"] <-  
  str_remove(ISFM_pivoted_outcome_data$`Outcome with ISFM`, 
  "productivity|income|environmental|social|human")

To find the summary of the ISFM outcome data

library(dplyr)

# Calculate quantiles for the distribution
ISFM_outcome_bounds <- ISFM_pivoted_outcome_data %>%
  group_by(Outcome, Practices) %>%
  summarise(
    lower_bound = quantile(Absolute_difference, 0.05, na.rm = TRUE),  # 5th percentile for lower bound
    upper_bound = quantile(Absolute_difference, 0.95, na.rm = TRUE),  # 95th percentile for upper bound
    min_value = min(Absolute_difference, na.rm = TRUE),               # Minimum value of the distribution
    max_value = max(Absolute_difference, na.rm = TRUE),               # Maximum value of the distribution
    mean_value = mean(Absolute_difference, na.rm = TRUE),             # Mean of the distribution
    sd_value = sd(Absolute_difference, na.rm = TRUE)                  # Standard deviation of the distribution
  ) %>%
  ungroup()

To understand the variation in the outcomes and uncover the stable and most profitable ISFM component we did a stochastic dominance test. Stochastic dominance is a sophisticated comparison of distributions with the dominating distribution being superior in terms of mean, variance, skewness and kurtosis. Stochastic dominance tests help to rank different distributions i.e the superiority of one probability distribution over another https://cran.r-project.org/web/packages/stodom/stodom.pdf. We will only limit to the first oder of dominance where farmers utility is based on maximizing the benefits (more to less). In comparision to the Decision Support package, the stochastic dominance method does not directly generate distributions based on uncertainty, but instead, it helps in comparing existing distributions of outcomes to see which one is preferred, based on certain dominance rules. It provides a way to determine whether one option (or distribution) is “better” than another without the need for specific risk preferences or utility functions. We will only test first order stochastic dominance since the scope of our study is only on utility and not on risk analysis.

# We compute the cumulative probabilities
ISFM_cdf_data <- ISFM_pivoted_outcome_data %>%
  filter(!is.na(Absolute_difference)) %>%
  group_by(Practices, Outcome) %>%
  arrange(Absolute_difference) %>%
  mutate(cdf = row_number() / n())

# We first add the units label and reorder the outcomes and practices

ISFM_cdf_data <- ISFM_cdf_data %>%
  mutate(
    Outcome_label = case_when(
      Outcome == "productivity" ~ "a. Maize productivity (t/ha)",
      Outcome == "income" ~ "b. Maize and soybean income (€/ha)",
      Outcome == "environmental" ~ "c. Soil health value (€/ha)",
      Outcome == "social" ~ "d. Social capital value (€/ha)",
      Outcome == "human" ~ "e. Health benefits (€/ha)",
      TRUE ~ Outcome
    ),
    Outcome_label = factor(Outcome_label, levels = c(
      "a. Maize productivity (t/ha)",
      "b. Maize and soybean income (€/ha)",
      "c. Soil health value (€/ha)",
      "d. Social capital value (€/ha)",
      "e. Health benefits (€/ha)"
    )),
    Practices = str_replace_all(Practices, "_", " "),
    Practices = str_trim(Practices),
    Practices = factor(Practices, levels = c(
      "Improved seed", 
      "Mineral fertilizer", 
      "Organic fertilizer",
      "Fertilizer combination",
      "Minimum tillage", 
      "Complete ISFM"
    )),
    Practices = case_when(
      Practices == "Improved seed" ~ "Improved seed (IG)",
      Practices == "Mineral fertilizer" ~ "IG + Mineral fertilizer",
      Practices == "Organic fertilizer" ~ "IG + Organic fertilizer",
      Practices == "Fertilizer combination" ~ "IG + Fertilizer combination",
      Practices == "Minimum tillage" ~ "IG + Minimum tillage",
      Practices == "Complete ISFM" ~ "Complete ISFM",
      TRUE ~ Outcome), 

  Practices= factor(Practices, levels = c(
  "Improved seed (IG)", 
      "IG + Mineral fertilizer", 
      "IG + Organic fertilizer",
      "IG + Fertilizer combination",
      "IG + Minimum tillage", 
      "Complete ISFM"))
  )


# Plotting the distribution
cdf_ISFM_plot <-
  ggplot(ISFM_cdf_data, aes(x = Absolute_difference, y = cdf, color = Practices)) +
  geom_line(linewidth = 0.7) +
  facet_wrap(~Outcome_label, scales = "free", ncol = 1, drop = FALSE) +
  labs(
    title = " ",
    x = "Absolute difference",
    y = "Cumulative Probability",
    color = "Practices"
  ) +
  theme_minimal() +
  theme(
    legend.position = "right",
    text = element_text(size = 12),
    strip.text = element_text(size = 12, face = "bold"),  
    panel.spacing = unit(1.5, "lines")
  )

ggsave("figures/cdf_ISFM_plot.png", width = 8, height = 10, dpi = 1500)

Based on these identified farmers’ archetypes we present the economic return of complete ISFM (complete adoption) which encompasses the income based on available resources and gender. We represent this as NPVs depending on the identified vital resources for ISFM: land, agricultural inputs/finance, knowledge, and labour.

We first find the summary of NPVs based on gender and resources

gender_range_NPV <- (ISFM_mc_simulation$y)[36:45]

intersectionality_NPV_summary_ranges <- gender_range_NPV %>%
  pivot_longer(cols = everything(), names_to = "Variable", values_to = "Value") %>%
  group_by(Variable) %>%
  summarise(
   `5th Percentile` = quantile(Value, probs = 0.05, na.rm = TRUE), #lower bound
    `95th Percentile` = quantile(Value, probs = 0.95, na.rm = TRUE), # Upper bound 
   Min = min(Value, na.rm = TRUE),
    Max = max(Value, na.rm = TRUE),
    Median = median(Value, na.rm = TRUE)
  )

Then we plot the Net present value

library(RColorBrewer)

#Variable order and labels
ordered_vars <- c(
  "NPV_men_ISFM_income",
  "NPV_women_ISFM_income",
  "NPV_men_ISFM_income_land_based",
  "NPV_women_ISFM_income_land_based", 
  "NPV_men_ISFM_income_inputs_based",
  "NPV_women_ISFM_income_inputs_based", 
  "NPV_men_ISFM_income_knowledge_based",
  "NPV_women_ISFM_income_knowledge_based",
  "NPV_men_ISFM_income_labor_based",
  "NPV_women_ISFM_income_labor_based"
)

new_labels <- c(
  "Men's income",
  "Women's income",
  "Men’s land-based income",
  "Women’s land-based income",
  "Men’s input-based income",
  "Women’s input-based income",
  "Men’s knowledge-based income",
  "Women’s knowlegde-based income",
  "Men’s labour-based income",
  "Women’s labour-based income"
)
names(new_labels) <- ordered_vars

# Extracting and reshaping the data
ISFM_data_violin <- as.data.frame(ISFM_mc_simulation$y)[, ordered_vars]

ISFM_data_long <- ISFM_data_violin %>%
  pivot_longer(cols = everything(), names_to = "Scenario", values_to = "NPV") %>%
  mutate(Scenario_label = factor(new_labels[Scenario], levels = new_labels))

# Meta data for gender and resource
meta <- tibble::tibble(
  Scenario = ordered_vars,
  Gender   = c("Men","Women","Men","Women","Men","Women","Men","Women","Men","Women"),
  Resource = c("Total","Total","Land","Land","Inputs","Inputs","Knowledge","Knowledge","Labour","Labour"),
  Label    = new_labels
)

ISFM_plot_df <- ISFM_data_long %>%
  left_join(meta, by = c("Scenario", "Scenario_label" = "Label")) %>%
  mutate(
    Gender   = factor(Gender, levels = c("Men","Women")),
    Resource = factor(Resource, levels = c("Total","Land","Inputs","Knowledge","Labour"))
  )

resource_colors <- c(
  "Total"     = "#bdbdbd",  
  "Land"      = "#ff7f00",  
  "Inputs"    = "#4daf4a",  
  "Knowledge" = "#e41a1c",  
  "Labour"    = "#377eb8"  
)

# new names for the legend
resource_labels <- c(
  "Total"     = "With all resources",
  "Land"      = "With land constraints",
  "Inputs"    = "With inputs constraints",
  "Knowledge" = "With knowledge constraints",
  "Labour"    = "With labour constraints"
)

#Creating the violin plot
p_side_by_side <- ggplot(ISFM_plot_df, aes(x = Resource, y = NPV, fill = Resource)) +
  geom_violin(trim = FALSE, scale = "width", color = "black") +
  geom_boxplot(width = 0.1, outlier.shape = NA, fill = "white") +
  facet_wrap(~ Gender, ncol = 2) +
  scale_fill_manual(
    values = resource_colors,
    labels = resource_labels,
    name = "Scenario"
  ) +
  theme_minimal(base_size = 13) +
  labs(
    x = NULL,
    y = "Net Present Value (€/ha)"
    
  ) +
  theme(
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.title.x = element_blank(),
    legend.position = "right",
    legend.title = element_text(size = 12, face = "bold"),
    legend.text = element_text(size = 11),
    strip.text = element_text(size = 13, face = "bold"),
    panel.spacing = unit(1, "lines")
  )

# Save figure
ggsave("figures/new_NPV_ISFM_gender_resource.png", 
       p_side_by_side, width = 8, height = 5, dpi = 1500)

Baseline and ISFM components Cash flow projection

library(patchwork)

cashflow_var_name_plot <- c(
  "Baseline", 
  "Improved_seed", 
  "Mineral_fertilizer", 
  "Organic_fertilizer",
  "Fertilizer_combination",
  "Minimum_tillage", 
  "Complete_ISFM"
)

# Removing underscores on the labels
readable_labels <- gsub("_", " ", cashflow_var_name_plot)
names(readable_labels) <- cashflow_var_name_plot

# new names for the legend
readable_labels <- c(
   "Baseline" = "Baseline", 
  "Improved_seed"= "Improved seed (IG)",
  "Mineral_fertilizer"= "IG + Mineral fertilizer", 
  "Organic_fertilizer" = "IG + Organic fertilizer",
  "Fertilizer_combination"= "IG + Fert. combination",
  "Minimum_tillage" = "IG + Minimum tillage", 
  "Complete_ISFM" = "IG + complete ISFM"
)


# Left quadrant with the Baseline only
baseline_cashflow <- plot_cashflow(
  mcSimulation_object = ISFM_mc_simulation,
  cashflow_var_name   = "Baseline",
  x_axis_name         = "Years",
  y_axis_name         = "Cash flow (€/ha)",
  color_5_95          = "blue4",
  color_25_75         = "lightblue",
  color_median        = "red",
  base_size           = 35
) +
  scale_x_continuous(breaks = c(1, 5, 10), expand = expansion(mult = c(0, 0))) +
  scale_y_continuous(limits = c(-300, 300)) +
  theme(
    strip.text.x = element_text(size = 35, face = "bold"),
    axis.text.x  = element_text(size = 40, face = "bold"),
    axis.text.y  = element_text(size = 40),
    axis.title.x = element_text(size = 40, face = "bold"),
    axis.title.y = element_text(size = 40, face = "bold"),
    legend.position = "none",                         
    plot.margin = margin(t = 20, r = 0, b = 20, l = 20, unit = "pt") +
                  margin(t = 0, r = 1.5, b = 0, l = 0, unit = "cm") 
  )

# Right quadrant with the 6 ISFM implementation levels
ISFM_levels <- setdiff(cashflow_var_name_plot, "Baseline")

ISFM_cashflow <- plot_cashflow(
  mcSimulation_object = ISFM_mc_simulation,
  cashflow_var_name   = ISFM_levels,
  x_axis_name         = "Years",
  y_axis_name         = "Cash flow (€/ha)",
  color_5_95          = "blue4",
  color_25_75         = "lightblue",
  color_median        = "red",
  base_size           = 35
) +
  scale_x_continuous(breaks = c(1, 5, 10), expand = expansion(mult = c(0, 0))) +
  scale_y_continuous(limits = c(-1000, 35000)) +
  theme(
    strip.text.x = element_text(size = 35, face = "bold"),
    axis.text.x  = element_text(size = 40),
    axis.text.y  = element_text(size = 40),
    axis.title.x = element_text(size = 40, face = "bold"),
    axis.title.y = element_text(size = 40, face = "bold"),
    plot.margin  = margin(t = 15, r = 15, b = 15, l = 15, unit = "pt") +
                   margin(t = 0, r = 0, b = 0, l = 2.0, unit = "cm"),
    legend.position = "right",
    panel.spacing.x = unit(1.5, "cm"), 
    panel.spacing.y = unit(1, "cm")   
  )
  
if ("FacetWrap" %in% class(ISFM_cashflow$facet)) {
  ISFM_cashflow$facet$params$labeller <- as_labeller(readable_labels)
  ISFM_cashflow$facet$params$ncol <- 3
}

#Combining the 2 quadrant
cashflow_combined_plot <- (
  baseline_cashflow + ISFM_cashflow +
  plot_layout(widths = c(1, 3), guides = "collect")
) &
  labs(x = "Years", y = "Cash flow (Euro/ha)") &
  theme( text = element_text(size = 40, family = "serif"),
    legend.position = "right",
    legend.title  = element_text(size = 40),
    legend.text   = element_text(size = 40),
    strip.text.x  = element_text(size = 40, face = "bold")
  )

ggsave("figures/ISFM_components_cashflow_plot.png",
       plot   = cashflow_combined_plot,
       width  = 33, height = 15, dpi = 1000)

Projection to Latent Structures (PLS)

We use Projection to Latent Structures (PLS) model to assess the correlation strength and direction for model variables and outcome variables. The Partial Least Squares is fitted with the orthogonal scores algorithm with pls::plsr

#For ISFM implimentation level

ISFM_implimentation_level_income<- ISFM_mc_simulation$y %>%
  select(7, 12, 17, 22, 27, 32)
library(dplyr)

#VIP function 
calculate_vip <- function(pls_model) {
  W <- pls_model$projection      
  T <- pls_model$scores
  p <- nrow(W)
  SS <- colSums(T^2) * colSums(pls_model$coefficients^2)
  vip <- numeric(p)
  for (j in seq_len(p)) {
    vip[j] <- sqrt(p * sum((W[j, ]^2) * SS) / sum(SS))
  }
  data.frame(variable = rownames(W), VIP = vip, stringsAsFactors = FALSE)
}

# ISFM implementation levels to evaluate 
outcomes <- c(
  "Improved_seed_income",
  "Mineral_fertilizer_income",
  "Organic_fertilizer_income",
  "Fertilizer_combination_income",
  "Minimum_tillage_income",
  "Complete_ISFM_income"
)

plsr_results <- lapply(outcomes, function(y) {
  list(
    outcome = y,
    pls     = plsr.mcSimulation(ISFM_mc_simulation, y)
  )
})

#Extracting VIP scores and the coefficients
all_data <- dplyr::bind_rows(lapply(plsr_results, function(res) {
  vip_df  <- calculate_vip(res$pls) |>
    dplyr::mutate(outcome = res$outcome)

  coef_df <- as.data.frame(res$pls$coefficients, stringsAsFactors = FALSE)
  coef_df$variable <- rownames(coef_df)
  names(coef_df)[1] <- "coefficient"

  dplyr::inner_join(vip_df, coef_df, by = "variable") |>
    dplyr::filter(variable %in% ISFM_table$variable,
                  VIP >= 1, VIP <= 10) |>
    dplyr::mutate(
      direction      = ifelse(coefficient > 0, "Positive", "Negative"),
      abs_VIP        = abs(VIP),
      variable_label = stringr::str_replace_all(variable, "_", " ")
    )
}))

#Grouping VIP scores into Low/Medium/High
all_data <- all_data |>
  dplyr::mutate(
    vip_category = cut(
      VIP,
      breaks = c(1, 2, 5, 10),
      labels = c("Low", "Medium", "High"),
      include.lowest = TRUE,
      right = TRUE
    )
  )

# Removing underscores on practices names 
practice_labels <- c(
  Improved_seed_income           = "Improved seed",
  Mineral_fertilizer_income      = "Mineral fertilizer",
  Organic_fertilizer_income      = "Organic fertilizer",
  Fertilizer_combination_income  = "Fertilizer combination",
  Minimum_tillage_income         = "Minimum tillage",
  Complete_ISFM_income           = "Complete ISFM"
)

all_data <- all_data |>
  dplyr::mutate(outcome_label = dplyr::recode(outcome, !!!practice_labels))

#Plotting the pls in bubble plot format
pls_bubble_plot <- ggplot2::ggplot(
  all_data,
  ggplot2::aes(x = VIP, y = reorder(variable_label, VIP), size = vip_category, color = direction)
) +
  ggplot2::geom_point(alpha = 0.8) +
  ggplot2::scale_color_manual(values = c("Positive" = "blue", "Negative" = "red")) +
  ggplot2::scale_size_manual(
    name   = "VIP Category",
    values = c("Low" = 2, "Medium" = 5, "High" = 10),
    guide  = ggplot2::guide_legend(override.aes = list(alpha = 1))
  ) +
  ggplot2::facet_wrap(~ outcome_label, ncol = 3, scales = "free_y") +
  ggplot2::labs(
    title = NULL,
    x = "VIP Score",
    y = "Input Variables",
    color = "Effect Direction"
  ) +
  ggplot2::theme_minimal(base_size = 14, base_family = "Times New Roman") +
  ggplot2::theme(
    strip.text        = ggplot2::element_text(size = 14, face = "bold"),
    strip.background  = ggplot2::element_blank(),
    panel.spacing     = grid::unit(8, "mm"),
    legend.position   = "bottom",
    legend.box        = "horizontal",
    legend.title      = ggplot2::element_text(size = 14),
    legend.text       = ggplot2::element_text(size = 14),
    axis.text.x       = ggplot2::element_text(size = 14),
    axis.text.y       = ggplot2::element_text(size = 14),
    axis.title.x      = ggplot2::element_text(size = 14),
    axis.title.y      = ggplot2::element_text(size = 14)
  )

ggsave("figures/bubbleplot_ISFM_levels_PLS.png",
  plot     = pls_bubble_plot,
  width    = 12,   
  height   = 16,
  dpi      = 500
)

Pls of ISFM and interectionality results

Here we only need to use the variables related to complete ISFM since it emerged to be the best ISFM. We remove variables for other ISFM so the correlation does not read the other ISFM components

#To get the names of outcomes needed to run some posthoc test 
ISFM_intersectionality_income <- names(ISFM_mc_simulation$y)[36:45]
# Removing variables that are not used in the intersectionality analysis
#These variables are related to other components of ISFM, while intersectionality analysis focused on the most beneficial ISFM (Complete ISFM)

pls_ISFM_input_variables <- ISFM_table %>%
  filter(!variable %in% c(
    "decay_rate", "traditional_maize_seed_price", "maize_yield_statusquo",
    "maize_yield_component1", "maize_yield_component2",
    "maize_yield_component3", "maize_yield_component4",
    "maize_yield_component5", "soybean_yield_component1",
    "soybean_yield_component2", "soybean_yield_component3", 
    "soybean_yield_component4", "soybean_yield_component5",
    "nutrient_partial_balance_value_organic_fertilizer",     
    "nutrient_partial_balance_value_mineral_fertilizer"
  ))

#function to calculate VIP scores 
calculate_vip <- function(pls_model) {
  W <- pls_model$projection    
  T <- pls_model$scores     
  p <- nrow(W)
  SS <- colSums(T^2) * colSums(pls_model$coefficients^2)
  vip <- numeric(p)
  for (j in seq_len(p)) {
    vip[j] <- sqrt(p * sum((W[j, ]^2) * SS) / sum(SS))
  }
  data.frame(variable = rownames(W), VIP = vip, stringsAsFactors = FALSE)
}

# Run all PLSR simulations
plsr_results <- list(
  list(pls = plsr.mcSimulation(ISFM_mc_simulation, "NPV_women_ISFM_income",ncomp = 1), category = "All Resources",          gender = "Women"),
  list(pls = plsr.mcSimulation(ISFM_mc_simulation, "NPV_women_ISFM_income_land_based", ncomp = 1), category = "Land", gender = "Women"),
  list(pls = plsr.mcSimulation(ISFM_mc_simulation, "NPV_women_ISFM_income_inputs_based", ncomp = 1), category = "Inputs",  gender = "Women"),
  list(pls = plsr.mcSimulation(ISFM_mc_simulation, "NPV_women_ISFM_income_knowledge_based", ncomp = 1), category = "Knowledge", gender = "Women"),
  list(pls = plsr.mcSimulation(ISFM_mc_simulation, "NPV_women_ISFM_income_labor_based",  ncomp = 1), category = "Labour", gender = "Women"),
  list(pls = plsr.mcSimulation(ISFM_mc_simulation, "NPV_men_ISFM_income", ncomp = 1), category = "All Resources",          gender = "Men"),
  list(pls = plsr.mcSimulation(ISFM_mc_simulation, "NPV_men_ISFM_income_land_based",     ncomp = 1), category = "Land",    gender = "Men"),
  list(pls = plsr.mcSimulation(ISFM_mc_simulation, "NPV_men_ISFM_income_inputs_based",   ncomp = 1), category = "Inputs",  gender = "Men"),
  list(pls = plsr.mcSimulation(ISFM_mc_simulation, "NPV_men_ISFM_income_knowledge_based", ncomp = 1), category = "Knowledge", gender = "Men"),
  list(pls = plsr.mcSimulation(ISFM_mc_simulation, "NPV_men_ISFM_income_labor_based",   ncomp = 1), category = "Labour",    gender = "Men")
)

# Extract VIP & coefficients and create a clean label for the practices that will be used for plotting
all_data <- bind_rows(lapply(plsr_results, function(res) {
  vip_df  <- calculate_vip(res$pls)
  coef_df <- as.data.frame(res$pls$coefficients, stringsAsFactors = FALSE)
  coef_df$variable <- rownames(coef_df)
  colnames(coef_df)[1] <- "coefficient"
  
  vip_df %>%
    inner_join(coef_df, by = "variable") %>%
    filter(variable %in% pls_ISFM_input_variables$variable,
           VIP >= 1,
           VIP <= 10) %>%
    mutate(
      direction       = ifelse(coefficient > 0, "Positive", "Negative"),
      abs_VIP         = abs(VIP),
      category        = res$category,
      gender          = res$gender,
      variable_label  = str_replace_all(variable, "_", " ")  
    )
}))

# Grouping VIP scores into Low/Medium/High
all_data <- all_data %>%
  mutate(
    vip_category = cut(
      VIP,
      breaks = c(1,2, 5, 10),
      labels = c("Low","Medium","High"),
      include.lowest = TRUE
    )
  )

# Plotting pls
pls_bubble_plot_intersectionality <- ggplot(all_data, aes(
    x    = VIP,
    y    = reorder(variable_label, VIP),  
    size = vip_category,
    color= direction
  )) +
  geom_point(alpha = 1) +
  scale_color_manual(values = c("Positive" = "blue", "Negative" = "red")) +
  scale_size_manual(
    name   = "VIP Category",
    values = c("Low" = 10, "Medium"= 15, "High" = 20)
  ) +
  guides(
    color = guide_legend(
      override.aes = list(size = 10)
    )
  )+
  facet_grid(gender ~ category, scales = "free_y", space = "free_y") +
  labs(
    title = NULL,
    x     = "VIP Score",
    y     = "Input Variables",
    color = "Effect Direction"
  ) +
  theme_minimal(base_size = 45) +
  theme(
    strip.text.x      = element_text(size = 50, face = "bold"),
    strip.text.y      = element_text(size = 55, face = "bold"),
    strip.background  = element_blank(),
    panel.spacing.y   = unit(2.0, "cm"),
    legend.position.inside= c(1.05, 0.5),
    legend.justification  = c(0, 0.5),
    legend.box.margin     = margin(0, 20, 0, 0),
    legend.background     = element_rect(fill = "white"),
    axis.text.y           = element_text(size = 55), 
    axis.text.x  = element_text(size = 55),
    axis.title.x = element_text(size = 55, face = "bold"),
    axis.title.y = element_text(size = 55, face = "bold"),
    plot.margin  = margin(t = 15, r = 15, b = 15, l = 15, unit = "pt") +
                   margin(t = 0, r = 0, b = 0, l = 2.0, unit = "cm"),
  )

ggsave("figures/bubbleplot_ISFM_PLS_by_Resource_and_Gender.png",
       plot   = pls_bubble_plot_intersectionality,
       width  = 45,
       height = 30,
       dpi    = 500)